Skip to content

feat(client): add Dynamo inference backend#2773

Open
biswapanda wants to merge 8 commits into
PrimeIntellect-ai:mainfrom
biswapanda:dynamo-integration
Open

feat(client): add Dynamo inference backend#2773
biswapanda wants to merge 8 commits into
PrimeIntellect-ai:mainfrom
biswapanda:dynamo-integration

Conversation

@biswapanda

@biswapanda biswapanda commented Jun 11, 2026

Copy link
Copy Markdown

Overview:

Adds NVIDIA Dynamo as an optional inference backend alongside the existing vLLM path. Controlled by a new ClientConfig.backend field ("vllm" | "dynamo"). Three self-contained changes: a pluggable AdminAPI abstraction, renderer_transport selection for the verifiers wire shape, and a Dynamo teacher-logprobs path for OPD training.

Details:

packages/prime-rl-configs/src/prime_rl/configs/shared.py

  • ClientConfig.backend: Literal["vllm", "dynamo"] — selects the AdminAPI implementation and verifiers wire shape. Default "vllm" is a no-op for existing configs.
  • ClientConfig.rl_base_url — optional override for the Dynamo RL worker discovery listener (GET /v1/rl/workers). When unset, the port is derived from DYN_RL_PORT (default 8001).

src/prime_rl/utils/client.py

  • AdminAPI Protocol + VLLMAdminAPI — extracts the existing vLLM admin paths (/pause, /resume, /update_weights, /load_lora_adapter, /init_broadcaster) into a typed protocol. VLLMAdminAPI methods go through a shared _admin_post helper that adds bounded per-attempt timeouts and tenacity retry on 5xx/transport errors (300 s for pause/resume, 720 s for weight updates).
  • DynamoAdminAPI — Dynamo worker admin over POST /engine/<method>: pause_generation, resume_generation, update_weights_from_disk / update_weights_from_distributed (filesystem vs NCCL paths), load_lora_adapter. Inherits health/model checks from VLLMAdminAPI.
  • setup_admin_api(client_config) — picks DynamoAdminAPI when backend="dynamo", VLLMAdminAPI otherwise.
  • discover_dynamo_admin_base_urls — resolves worker system URLs from GET /v1/rl/workers; falls back to port-replaced base_url when rl_base_url is unset.
  • setup_clients — sets renderer_transport="dynamo_chat" on all vf.ClientConfig objects when backend="dynamo", "vllm_generate" otherwise. Requires verifiers #1574 + renderers #79.

src/prime_rl/orchestrator/utils.py

  • Splits compute_teacher_logprobs into two paths dispatched on client_config.renderer_transport: _compute_teacher_logprobs_vllm (existing /inference/v1/generate path) and _compute_teacher_logprobs_dynamo (POST /v1/chat/completions with nvext.token_data + nvext.extra_fields=["prompt_logprobs"]).
  • _flatten_prompt_logprobs — shared flattener that handles both vLLM typed Logprob objects and Dynamos dict shape {logprob, rank?, decoded_token?}.

Where should the reviewer start?

  1. src/prime_rl/utils/client.pyAdminAPI protocol (line ~32), DynamoAdminAPI class, setup_admin_api, and setup_clients renderer_transport selection. Core of the change.
  2. src/prime_rl/orchestrator/utils.py_compute_teacher_logprobs_dynamo and the compute_teacher_logprobs dispatcher. Note the placeholder messages field required by the Dynamo frontend even when nvext.token_data is set.
  3. packages/prime-rl-configs/src/prime_rl/configs/shared.py — the two new ClientConfig fields; verify defaults are backward-compatible.

Related Issues:

  • Relates to verifiers #1574 — adds renderer_transport field to vf.ClientConfig
  • Relates to renderers #79 — adds dynamo_chat transport to renderers.generate()

Note

Medium Risk
Changes the weight-update and NCCL initialization paths when backend=dynamo, but default vllm behavior is preserved; misconfigured discovery or engine RPC could break training on Dynamo deployments.

Overview
Adds NVIDIA Dynamo as an optional inference backend via ClientConfig.backend ("vllm" | "dynamo", default unchanged) and optional rl_base_url for RL worker discovery.

Admin layer: Inference admin is refactored behind an AdminAPI protocol with VLLMAdminAPI (existing /pause, /update_weights, etc.) and DynamoAdminAPI (POST /engine/*, filesystem vs NCCL weight updates, LoRA via load_lora). Health, model checks, weight updates, LoRA load, and NCCL init all route through the selected implementation.

Dynamo wiring: When admin_base_url is unset, worker system URLs are discovered from GET /v1/rl/workers (port from rl_base_url or DYN_RL_PORT). Static pools retry discovery in wait_for_ready; elastic pools pin each pod’s admin client to the matching system_url by IP/DNS.

Rollouts & OPD: setup_clients sets renderer_transport to "dynamo" for the nvext wire shape. compute_teacher_logprobs dispatches to vLLM /inference/v1/generate or Dynamo chat completions with nvext.token_data. The orchestrator passes weight_broadcast.type into DynamoAdminAPI for NCCL vs disk updates.

Elastic: Separate model HTTP clients on the OpenAI-compat URL while admin hits the system server; backend is preserved when rebuilding train clients.

Reviewed by Cursor Bugbot for commit 3c41ee3. Bugbot is set up for automated code reviews on this repo. Configure here.

@biswapanda biswapanda changed the title Dynamo integration feat(client): add Dynamo inference backend — AdminAPI, renderer transport, teacher logprobs Jun 11, 2026
Comment thread src/prime_rl/utils/client.py
Comment thread src/prime_rl/utils/client.py
Comment thread src/prime_rl/utils/client.py
Comment thread src/prime_rl/utils/client.py
Comment thread src/prime_rl/utils/client.py
Comment thread src/prime_rl/utils/client.py
@biswapanda biswapanda changed the title feat(client): add Dynamo inference backend — AdminAPI, renderer transport, teacher logprobs feat(client): add Dynamo inference backend Jun 11, 2026
Comment thread src/prime_rl/utils/elastic.py
Comment thread src/prime_rl/utils/elastic.py
Comment thread src/prime_rl/utils/elastic.py Outdated

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit a31c60b. Configure here.

Comment thread src/prime_rl/utils/client.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant